PacificA: Replication in Log-Based Distributed Storage Systems
نویسندگان
چکیده
Large-scale distributed storage systems have gained popularity for storing and processing ever increasing amount of data. Replication mechanisms are often key to achieving high availability and high throughput in such systems. Research on fundamental problems such as consensus has laid out a solid foundation for replication protocols. Yet, both the architectural design and engineering issues of practical replication mechanisms remain an art. This paper describes our experience in designing and implementing replication for commonly used log-based storage systems. We advocate a general replication framework that is simple, practical, and strongly consistent. We show that the framework is flexible enough to accommodate a variety of different design choices that we explore. Using a prototype system called PacificA, we implemented three different replication strategies, all using the same replication framework. The paper reports detailed performance evaluation results, especially on system behavior during failure, reconciliation, and recovery.
منابع مشابه
E2DR: Energy Efficient Data Replication in Data Grid
Abstract— Data grids are an important branch of gird computing which provide mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...
متن کاملImproving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy
Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources that enables users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years creates multiple copies of file and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...
متن کاملData Replication-Based Scheduling in Cloud Computing Environment
Abstract— High-performance computing and vast storage are two key factors required for executing data-intensive applications. In comparison with traditional distributed systems like data grid, cloud computing provides these factors in a more affordable, scalable and elastic platform. Furthermore, accessing data files is critical for performing such applications. Sometimes accessing data becomes...
متن کاملA Non-MDS Erasure Code Scheme for Storage Applications
This paper investigates the use of redundancy and self repairing against node failures indistributed storage systems using a novel non-MDS erasure code. In replication method, accessto one replication node is adequate to reconstruct a lost node, while in MDS erasure codedsystems which are optimal in terms of redundancy-reliability tradeoff, a single node failure isrepaired after recovering the ...
متن کاملAn Efficient Data Replication Strategy in Large-Scale Data Grid Environments Based on Availability and Popularity
The data grid technology, which uses the scale of the Internet to solve storage limitation for the huge amount of data, has become one of the hot research topics. Recently, data replication strategies have been widely employed in distributed environment to copy frequently accessed data in suitable sites. The primary purposes are shortening distance of file transmission and achieving files from ...
متن کامل